THE PROBABILITY OF FINDING A FIXED PATTERN IN RANDOM 
DATA DEPENDS MONOTONICALLY ON THE BIFIX INDICATOR. 



ALEX SCHREIBER 



Abstract. We consider the problem of finding a fixed L-ary sequence in a stream of random 
L-ary data. It is known that the expected search time is a strictly increasing function of the 
lengths of the bifices of the pattern. In this paper we prove the related statement that the 
probability of finding the pattern in a finite random word is a strictly decreasing function 
of the lengths of the bifices of the pattern. 

1. Problem statement 

Definition 1 (Bifix indicators). Let $b = (b_1, b_2, \ldots, b_n)$ be a word of length $n \ge 2$ over the
alphabet $\{0, \ldots, L-1\}$, $L \ge 2$. We define the bifix indicator $h = (h_1, \ldots, h_{n-1})$ of $b$ as a
binary word of length $n-1$ such that

\[ h_i = \begin{cases} 1 & \text{if } (b_1, \ldots, b_i) = (b_{n-i+1}, \ldots, b_n), \\ 0 & \text{otherwise.} \end{cases} \]

By $h \le h'$ we mean $h_i \le h'_i$ for all $1 \le i \le n-1$, and by $h < h'$ that both
$h \le h'$ and $h \ne h'$.
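
As a small illustration of Definition 1, the following Python sketch computes the bifix
indicator of a word given as a tuple of letters. The function name `bifix_indicator` is our
own choice for this illustration and does not appear elsewhere in the text.

```python
def bifix_indicator(b):
    """Return h = (h_1, ..., h_{n-1}) for a word b given as a tuple of letters.

    h_i is 1 exactly when the prefix of length i equals the suffix of length i.
    """
    n = len(b)
    return tuple(int(b[:i] == b[n - i:]) for i in range(1, n))


# The prefixes of length 1 and 3 of this word are also suffixes.
print(bifix_indicator((1, 0, 1, 0, 1)))  # (1, 0, 1, 0)
```

Python indexes from 0, so the entry `h[i - 1]` of the returned tuple corresponds to $h_i$.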

By speaking of a random word (of finite length) or a random sequence (of infinite length),
here and elsewhere in this paper we mean a word or sequence whose members are
selected independently and uniformly, each with probability $\frac{1}{L}$, from the alphabet.

Definition 2 (Probabilities $P_k$ and $p_k$). Fixing $L$, a word $b$ and the integer $k$, we let $P_k$
be the probability that the word $b$ appears (at least once) as a substring in a random word of
length $k$. We let $p_k$ be the probability that $b$ appears exactly once, namely at the end, in a
random word of length $k$.

One has $P_k = \sum_{i=1}^{k} p_i$ because if the searched substring appears at least once then there
is an index $1 \le i \le k$ at which it appears for the first time. Obviously, $P_k = 0$ and $p_k = 0$ for
$0 \le k \le n-1$.
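
For very small parameters, the two probabilities of Definition 2 can be obtained by plain
enumeration. The following sketch does this with exact fractions; the helper names
(`occurs_in`, `hit_probability`, `first_hit_probability`) are hypothetical and only serve
this illustration. The running time grows like $L^k$, so it is meant for sanity checks only.

```python
from fractions import Fraction
from itertools import product


def occurs_in(word, b):
    """True if the tuple b is a contiguous substring of the tuple word."""
    n = len(b)
    return any(word[j:j + n] == b for j in range(len(word) - n + 1))


def hit_probability(b, L, k):
    """P_k: probability that b occurs in a uniformly random word of length k."""
    hits = sum(occurs_in(w, b) for w in product(range(L), repeat=k))
    return Fraction(hits, L ** k)


def first_hit_probability(b, L, k):
    """p_k: probability that b occurs at the end of the word and nowhere earlier."""
    n = len(b)
    if k < n:
        return Fraction(0)
    hits = sum(w[k - n:] == b and not occurs_in(w[:k - 1], b)
               for w in product(range(L), repeat=k))
    return Fraction(hits, L ** k)


b = (1, 0, 1)
print(hit_probability(b, 2, 5))                                # 11/32
print(sum(first_hit_probability(b, 2, i) for i in range(1, 6)))  # 11/32 as well
```

The second printed value illustrates the identity $P_k = \sum_{i=1}^{k} p_i$ for this one example.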

One can interpret those probabilities also in the situation that one is looking for $b$ in
an infinite sequence $(d_1, d_2, \ldots)$, where $p_k$ is the probability of waiting time $k$, i.e. the
probability that $b$ appears for the first time in $(d_1, \ldots, d_k)$, namely as $b = (d_{k-n+1}, \ldots, d_k)$.

The aim of this paper is to prove the following theorem.

Theorem 3. Let $b$ and $b'$ be two words of length $n$ over the alphabet $\{0, \ldots, L-1\}$. Let $h$ and
$h'$ be their bifix indicators and $P_k$ and $P'_k$ the corresponding probabilities as defined before.

We claim that
a) if $h = h'$ then $P_k = P'_k$ for any $k \ge 0$ and that
b) if $h < h'$ then $P_k > P'_k$ for $k \ge k_0$ where

\[ k_0 := n + \min\{\, n - i \mid 1 \le i \le n-1,\ h_i = 0 \text{ and } h'_i = 1 \,\}. \]

For $0 \le k < k_0$ it holds $P_k = P'_k$.

Note that part a) is easy and well-known. Our interest lies in part b). In his PhD thesis 
[3], in 2010, Pavol Hanus states a special version of part b) of Theorem 3, namely for the case
that $h = (0, \ldots, 0)$, without giving a complete proof. I am grateful to Alexander Mathis
who pointed out this subject and reference to me. 

Peter Nielsen gave in [4], in 1973, a formula for the expectation of the waiting time until
the first occurrence of $b$ in a random word, the right-hand side of

\[ \sum_{i=1}^{\infty} p_i \, i = L^n + \sum_{i=1}^{n-1} h_i L^i. \]

This formula is actually a bit different from the one in [4] because of deviations in our
definitions. What we call $P_k$ would be $P_{k-n}$ for Nielsen. Thus he would also subtract $n$ from
the right-hand side. Note that this formula quantifies the fact that the expected search time
is an increasing function of the bifix pattern. If one were just interested in this qualitative
fact, this would follow independently from our paper as one has

\[ \sum_{i=1}^{\infty} p_i \, i = \sum_{k=0}^{\infty} (1 - P_k). \]
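
The expectation formula can be checked numerically for small patterns. The sketch below is
only an approximation under stated assumptions: it computes $p_k$ in floating point by
conditioning on where the first occurrence ends (non-overlapping and overlapping cases),
and it truncates the series $\sum_i p_i \, i$ at a hypothetical cut-off `K`, which must be
large compared with $L^n$. The function name is ours.

```python
def expected_waiting_time(h, L, K=400):
    """Approximate sum_k k * p_k for a bifix indicator h = (h_1, ..., h_{n-1})."""
    n = len(h) + 1
    q = 1.0 / L ** n
    p = [0.0] * (K + 1)
    prefix = 0.0  # running value of p_n + ... + p_{k-n}
    for k in range(n, K + 1):
        if k - n >= n:
            prefix += p[k - n]
        # First occurrence overlaps the one ending at k in exactly i letters.
        overlap = sum(h[i - 1] * p[k - n + i] / L ** (n - i) for i in range(1, n))
        p[k] = q - q * prefix - overlap
    return sum(k * p[k] for k in range(K + 1))


# L = 2, n = 3: compare with L^n + sum_i h_i L^i, i.e. 8, 10 and 14.
for h in [(0, 0), (1, 0), (1, 1)]:
    print(h, round(expected_waiting_time(h, 2), 6))
```

For larger $L^n$ the default cut-off is too small and has to be raised accordingly.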

In order to prove our theorem it would be desirable to calculate an explicit formula for
the value of $P_k$ like the one for the expected search time, but we believe that there is none.
At least there is no affine dependence of $p_k$ (or, equivalently, $P_k$) on $h$ as there is for the
expectation of the waiting time. As an example, let $L = 2$ and consider for $b$ the words

\[ b^1 = (1,0,0,0,0), \quad b^2 = (1,0,0,0,1), \quad b^3 = (1,0,0,1,0), \quad b^4 = (1,1,0,1,1). \]

Then one has the corresponding bifix patterns

\[ h^1 = (0,0,0,0), \quad h^2 = (1,0,0,0), \quad h^3 = (0,1,0,0), \quad h^4 = (1,1,0,0), \]

consequently also (componentwise) $h^1 + h^4 = h^2 + h^3$, but in general $P^1_k + P^4_k \ne P^2_k + P^3_k$,
e.g. for $k = 12$, as we calculated with the help of a computer.
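
The computer calculation mentioned above can be reproduced by enumeration. The following
sketch is one possible way to do so, not necessarily the program that was originally used:
it counts, for each of the four words, the binary words of length 12 that contain it. Since
all four probabilities share the denominator $2^{12}$, comparing counts is enough. The
function name `count_words_containing` is our own.

```python
from itertools import product


def count_words_containing(b, L, k):
    """Number of words of length k over {0, ..., L-1} that contain the tuple b."""
    n = len(b)
    return sum(any(w[j:j + n] == b for j in range(k - n + 1))
               for w in product(range(L), repeat=k))


words = [(1, 0, 0, 0, 0), (1, 0, 0, 0, 1), (1, 0, 0, 1, 0), (1, 1, 0, 1, 1)]
counts = [count_words_containing(b, 2, 12) for b in words]
print(counts)                                      # [1000, 968, 924, 894]
print(counts[0] + counts[3], counts[1] + counts[2])  # 1894 1892
```

The two sums differ by 2, that is, by $2 / 2^{12}$ in terms of probabilities.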

However, recursive formulas are known for $p_k$ and $P_k$. In order to get explicit formulas,
one would probably use those recursions and try to derive closed expressions from them. 
For that purpose, one would have to solve polynomial equations of degree n. As mentioned 
before, we do not believe it to be feasible to find explicit formulas. 

Instead, we will show that the same recursions are obtained for certain Markov chains. 
Then we will compare the probabilities for the Markov chains, obtaining in that way a proof 
for the main theorem. 

2. Recursive formulas for the probabilities 

The probability $p_k$ amounts to the probability of occurrence of the search pattern at
position $k$ reduced by the probability of an additional occurrence at an earlier time. Paying
attention to the overlap structure given by the bifix indicator $h$ of the word $b$, one gets the
formula

\[ p_k = \frac{1}{L^n} - \frac{1}{L^n} \sum_{i=n}^{k-n} p_i - \sum_{i=1}^{n-1} h_i \frac{1}{L^{n-i}} \, p_{k-n+i} \quad \text{for any } k \ge n, \tag{1} \]

that appeared in [2], in 2005. Knowing that $p_1 = \cdots = p_{n-1} = 0$, the formula allows one to
calculate all the $p_k$. Among other things, this shows that the $p_k$ depend only on $h$, not on
further information about $b$, as claimed by part a) of the theorem. By taking the difference of the
equation with the shifted equation where $k$ is replaced by $k+1$ one gets the following linear
non-homogeneous $(n+1)$-term recursion for $p_k$, which was already given in [1], in 1995:



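\[ p_{k+1} = p_k - \frac{1}{L^n} \, p_{k+1-n} - \sum_{i=1}^{n-1} h_i \frac{1}{L^{n-i}} \left( p_{k-n+i+1} - p_{k-n+i} \right) \quad \text{for any } k \ge n, \tag{2} \]

\[ p_1 = \cdots = p_{n-1} = 0, \quad p_n = \frac{1}{L^n}. \tag{3} \]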

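By summing up for $k = n, n+1, \ldots, K$, afterwards substituting $k$ for $K$, and noting
that $p_n = \frac{1}{L^n}$, one gets

\[ P_{k+1} = \frac{1}{L^n} + P_k - \frac{1}{L^n} \, P_{k+1-n} - \sum_{i=1}^{n-1} h_i \frac{1}{L^{n-i}} \left( P_{k-n+i+1} - P_{k-n+i} \right) \quad \text{for any } k \ge n, \tag{4} \]

\[ P_1 = \cdots = P_{n-1} = 0, \quad P_n = \frac{1}{L^n}. \tag{5} \]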
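
To make the recursions concrete, here is a minimal Python sketch that evaluates the first
recursion of this section, the one expressing $p_k$ through $p_n, \ldots, p_{k-1}$, with
exact fractions and obtains $P_k$ as a partial sum. The function name `first_hit_probs` is
hypothetical; the input is the bifix indicator only, in line with the remark that the $p_k$
depend on nothing else.

```python
from fractions import Fraction


def first_hit_probs(h, L, K):
    """Return [p_0, ..., p_K] for a bifix indicator h = (h_1, ..., h_{n-1})."""
    n = len(h) + 1
    q = Fraction(1, L ** n)
    p = [Fraction(0)] * (K + 1)
    for k in range(n, K + 1):
        # Earlier first occurrences that do not overlap the one ending at k.
        no_overlap = sum(p[n:k - n + 1], Fraction(0))
        # Earlier first occurrences overlapping it in exactly i letters.
        overlap = sum(Fraction(h[i - 1], L ** (n - i)) * p[k - n + i]
                      for i in range(1, n))
        p[k] = q - q * no_overlap - overlap
    return p


p = first_hit_probs((1, 0), 2, 5)  # bifix indicator of the word (1, 0, 1)
print(p[3], p[4], p[5], sum(p))    # 1/8 1/8 3/32 11/32
```

The last value is $P_5$ for the binary word $(1, 0, 1)$; eleven of the 32 binary words of
length 5 contain it.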



3. Markov chains for our problem 

In the following, we will turn our focus towards Markov chains. The motivation is that,
given a fixed word $b$ and semi-infinite random data $d = (d_1, d_2, \ldots)$, we are looking for the first
appearance of $b$ in $d$, to that end observing $d$ "from left to right". Our first idea, which was not
entirely successful, was to associate to $b$ a Markov chain $X(b) = (X_1, X_2, \ldots)$ whose state $X_k$
would measure how good the chances are to encounter the subword $b$ in $(d_1, \ldots, d_k)$ or some
more letters of $d$. More specifically, $X_k = n$ would mean that $(d_1, \ldots, d_k)$ contains $b$ and
otherwise $X_k$ would be the greatest $i$ ($0 \le i \le n-1$) such that $(b_1, \ldots, b_i) = (d_{k-i+1}, \ldots, d_k)$.
The larger this $i$, the better would be the chances to find an instance of $b$ soon.

We want to prove a statement which uses only the bifix indicator of a search
string $b$, not $b$ itself. However, there could be two different words $b^1$ and $b^2$ representing the
same bifix class $h$ but, according to the construction described beforehand, having different
associated Markov chains $X(b^1)$ and $X(b^2)$.

For that reason, we will, in an improved approach, associate a Markov chain $X(h)$ to each bifix
class $h$ such that, given a bifix class $h$ instead of a word $b$, one can often (but not always)
find a representing word $b$ for $h$ such that $X(h) = X(b)$.

This Markov chain will have the property that $P_k = \Pr(X_k = n)$ whenever $P_k$ is defined
as in the last section for a word $b$ and the Markov chain is the one associated to the bifix
class $h$ of $b$.

Definition 4. We consider $n$-ary words $s = (s_0, \ldots, s_{n-1})$ with

\[ 0 \le s_i \le i \quad \text{for } 0 \le i \le n-1. \tag{6} \]

To such a word we associate the stationary (time-homogeneous) Markov chain $X(s) = (X_0, X_1, \ldots)$ where
$X_0, X_1, \ldots$ are random variables which map to the set $\{0, 1, \ldots, n\}$, with initial condition
$X_0 = 0$ (a.s.) and the transition probability $p_{ij} = \Pr(X_{k+1} = j \mid X_k = i)$ from state $i$ to state
$j$ given by

\[ p_{ij} = \begin{cases}
1 & \text{if } i = j = n, \\
\frac{1}{L} & \text{if } i \le n-1,\ i + 1 = j, \\
\frac{1}{L} & \text{if } i \le n-1,\ s_i = j > 0, \\
\frac{L-2}{L} & \text{if } i \le n-1,\ s_i > j = 0, \\
\frac{L-1}{L} & \text{if } i \le n-1,\ s_i = j = 0, \\
0 & \text{otherwise.}
\end{cases} \]

Furthermore, we define

\[ P_k := \Pr(X_k = n). \]



It will turn out soon that these numbers are related to those defined in the preceding 
section, therefore justifying our notation. 

The initial condition and the transition probabilities determine the joint distribution of
the Markov chain uniquely. The interpretation is that in each step with probability $\frac{L-2}{L}$ the
transition is to state $0$, with probability $\frac{1}{L}$ the transition is from state $i$ to state $s_i$, and with
probability $\frac{1}{L}$ the transition is from state $i$ to state $i+1$. Whenever the numbers $0$ and $s_i$
are equal, the probabilities add up correspondingly.

To a given binary word $h = (h_1, \ldots, h_{n-1})$ we associate
$s := \tilde{h} := (0, 1 - h_{n-1}, 1 - h_{n-2}, \ldots, 1 - h_1)$.

Definition 5. Fix the search word $b$. For $k \ge 0$ and $0 \le i \le n$ we let
$P_k(i) := \Pr(X_{k+j} = n \mid X_j = i)$. In particular, one has $P_k(0) = P_k$ and $P_k(n) = 1$.
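
The quantities $P_k(i)$ can be tabulated by conditioning on the first step of the chain:
from a state $i \le n-1$ the chain moves to $i+1$, to $s_i$ or to $0$ with the weights
described above. The sketch below does this with exact fractions. The names `s_from_h` and
`hitting_table` are ours, and the tuple `s` is assumed to satisfy $0 \le s_i \le i$.

```python
from fractions import Fraction


def s_from_h(h):
    """Map h = (h_1, ..., h_{n-1}) to s = (0, 1 - h_{n-1}, ..., 1 - h_1)."""
    return (0,) + tuple(1 - x for x in reversed(h))


def hitting_table(s, L, K):
    """Return a table with table[k][i] = P_k(i) for 0 <= k <= K and 0 <= i <= n."""
    n = len(s)
    table = [[Fraction(0)] * n + [Fraction(1)]]  # k = 0: only state n has been reached
    for _ in range(K):
        prev = table[-1]
        # One step from i: up to i + 1, to s_i, or to 0, then k - 1 further steps.
        row = [(prev[i + 1] + prev[s[i]] + (L - 2) * prev[0]) / L for i in range(n)]
        table.append(row + [Fraction(1)])        # state n is absorbing
    return table


s = s_from_h((1, 0))  # the indicator of the word (1, 0, 1)
table = hitting_table(s, 2, 5)
print(s, table[5][0])  # (0, 1, 0) 11/32
```

When $s_i = 0$ the two terms `prev[s[i]]` and `(L - 2) * prev[0]` merge into weight
$\frac{L-1}{L}$ for state $0$, matching the remark that the probabilities add up.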

Theorem 6. Consider a binary word $h = (h_1, \ldots, h_{n-1})$. The probabilities $P_k = P_k(h, L)$ of
the Markov chain $X(h)$ associated to the word $h$ are the same as the numbers $P_k = P_k(h, L)$
defined recursively in (4) and (5) for $h$. Thus, the probability $P_k = P_k(b, L)$ of finding a
word $b$ with bifix pattern $h$ in a random word of length $k$ is the same as the probability
$P_k = P_k(h, L)$ for the corresponding Markov chain $X(h)$.

Proof. We have $P_k = \frac{L-1}{L} P_{k-1} + \frac{1}{L} P_{k-1}(1)$ for $k \ge 1$ or, equivalently,
$P_k(1) = L P_{k+1} - (L-1) P_k$ for $k \ge 0$, and on the other hand

\[ P_k(1) = \frac{1}{L^{n-1}} + \sum_{i=1}^{n-1} \frac{L-2}{L^i} P_{k-i} + \sum_{i=1}^{n-1} \frac{1}{L^i} \begin{cases} P_{k-i}(0) & \text{if } h_{n-i} = 1 \\ P_{k-i}(1) & \text{if } h_{n-i} = 0 \end{cases} \quad \text{for } k \ge n-1. \]

One can see this equation as follows. With probability $\frac{1}{L^{n-1}}$, the state $1$ changes directly in
$n-1$ steps to $n$. Otherwise the state increases by one exactly $i-1$ times in direct sequence
and subsequently jumps to $0$ (first summation) or to $s_i = 1 - h_{n-i}$ (second summation).

Equating the right-hand sides of the last two equations, and using
$P_{k-i}(1) = L P_{k-i+1} - (L-1) P_{k-i}$, so that $P_{k-i} - P_{k-i}(1) = L (P_{k-i} - P_{k-i+1})$, one gets

\begin{align*}
L P_{k+1} - (L-1) P_k
&= \frac{1}{L^{n-1}} + \sum_{i=1}^{n-1} \frac{1}{L^i} \Big( (L-2) P_{k-i} + P_{k-i}(1) + h_{n-i} \big( P_{k-i} - P_{k-i}(1) \big) \Big) \\
&= \frac{1}{L^{n-1}} + \sum_{i=1}^{n-1} \frac{1}{L^i} \Big( -P_{k-i} + L P_{k-i+1} + h_{n-i} L \big( P_{k-i} - P_{k-i+1} \big) \Big) \\
&= \frac{1}{L^{n-1}} + P_k - \frac{1}{L^{n-1}} P_{k-n+1} + \sum_{i=1}^{n-1} \frac{1}{L^i} h_{n-i} L \big( P_{k-i} - P_{k-i+1} \big)
\end{align*}

and, after dividing by $L$ and substituting $n - i$ for $i$,

\begin{align*}
P_{k+1} &= \frac{1}{L^n} + \frac{L-1}{L} P_k + \frac{1}{L} P_k - \frac{1}{L^n} P_{k-n+1} + \sum_{i=1}^{n-1} \frac{1}{L^i} h_{n-i} \big( P_{k-i} - P_{k-i+1} \big) \\
&= \frac{1}{L^n} + P_k - \frac{1}{L^n} P_{k+1-n} - \sum_{i=1}^{n-1} h_i \frac{1}{L^{n-i}} \big( P_{k-n+i+1} - P_{k-n+i} \big) \quad \text{for any } k \ge n.
\end{align*}

This is in fact the same as (4). Obviously, (5) also holds.



□ 
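
The statement just proved can also be observed numerically. The following sketch, which is
an illustration and not part of the proof, computes $P_k$ once from the recursion for $p_k$
of Section 2 and once from the Markov chain built from $s = (0, 1 - h_{n-1}, \ldots, 1 - h_1)$,
and compares the two for every binary word $h$ of length 3 (so $n = 4$). Function names and
the ranges of $L$ and $k$ are arbitrary choices of ours.

```python
from fractions import Fraction
from itertools import product


def P_recursion(h, L, K):
    """[P_0, ..., P_K] as partial sums of the p_k from the Section 2 recursion."""
    n = len(h) + 1
    q = Fraction(1, L ** n)
    p = [Fraction(0)] * (K + 1)
    for k in range(n, K + 1):
        p[k] = (q - q * sum(p[n:k - n + 1], Fraction(0))
                - sum(Fraction(h[i - 1], L ** (n - i)) * p[k - n + i]
                      for i in range(1, n)))
    sums, total = [], Fraction(0)
    for value in p:
        total += value
        sums.append(total)
    return sums


def P_chain(h, L, K):
    """[P_0, ..., P_K] with P_k = Pr(X_k = n) for the chain started in state 0."""
    s = (0,) + tuple(1 - x for x in reversed(h))
    n = len(s)
    row = [Fraction(0)] * n + [Fraction(1)]
    result = [row[0]]
    for _ in range(K):
        row = [(row[i + 1] + row[s[i]] + (L - 2) * row[0]) / L
               for i in range(n)] + [Fraction(1)]
        result.append(row[0])
    return result


agree = all(P_recursion(h, L, 15) == P_chain(h, L, 15)
            for L in (2, 3) for h in product((0, 1), repeat=3))
print(agree)
```

Not every binary word $h$ is the bifix indicator of an actual word, but both computations
are defined for all of them, which is all the comparison needs.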



The reasoning in the remainder of this section is chosen in a rather elementary way,
avoiding the use of more probability theory than necessary. Another way would be to use
the notions of stochastic domination and couplings between random variables, which appear
to arise quite naturally in the given situation. However, we did not take this route as it
would actually not shorten the argument significantly.

Lemma 7. $P_k(i)$ is monotonically increasing in $k$, i.e.

\[ P_{k+1}(i) \ge P_k(i) \quad \text{for } k \ge 0,\ 0 \le i \le n. \]

Proof. This is just because $X_k = n$ implies $X_{k+1} = n$. (Almost surely, but we will skip that
specification in general.) □

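Lemma 8. It holds $P_k(i) > 0$ if and only if $k + i \ge n$.

Proof. One has $P_k(i) > 0$ for $k + i \ge n$, since the chain can climb from $i$ to $n$ in $n - i \le k$
consecutive steps, each of probability $\frac{1}{L}$, and then stays at $n$. On the other hand, if
$k + i < n$, then $P_k(i) = 0$ because in general $X_{k+1} \le X_k + 1$, so if $X_0 = i$ then $X_k \le i + k < n$.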

□ 

Lemma 9. $P_k(i)$ is monotonically increasing in $i$, i.e.

\[ P_k(i+1) \ge P_k(i) \quad \text{for } k \ge 0,\ 0 \le i \le n-1. \]

The inequality is strict if, in addition, $k + i + 1 \ge n$; otherwise both sides equal $0$.

Proof. We proceed by induction on $k$. The case $k = 0$ is trivial because then only $i = n-1$ fulfills the
requirement that $k + i + 1 \ge n$, and then $P_0(n) = 1$ and $P_0(n-1) = 0$. For the induction
step, we have

\begin{align*}
P_k(i) &= \frac{1}{L} P_{k-1}(i+1) + \frac{1}{L} P_{k-1}(s_i) + \frac{L-2}{L} P_{k-1}(0) \\
&\le \frac{1}{L} P_{k-1}(i+1) + \frac{L-1}{L} P_{k-1}(i+1) = P_{k-1}(i+1) \le P_k(i+1),
\end{align*}

where the first inequality is by induction hypothesis (note that $s_i < i+1$ and $0 < i+1$)
and the last one is by Lemma 7. In addition, if $k + i \ge n$, the first inequality is strict,
by induction hypothesis. If instead $k + i + 1 = n$, the last inequality is strict because then
$P_{k-1}(i+1) = 0$ but $P_k(i+1) > 0$ by Lemma 8. □
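
The three lemmas lend themselves to a quick numerical sanity check. The sketch below, which
does not replace the proofs, tabulates $P_k(i)$ for every word $s$ of length $n = 4$ with
$0 \le s_i \le i$ and tests the statements of Lemmas 7, 8 and 9, including the strictness
condition. All names and the ranges of $L$ and $k$ are our own choices.

```python
from fractions import Fraction
from itertools import product


def hitting_table(s, L, K):
    """table[k][i] = P_k(i), computed by conditioning on the first step."""
    n = len(s)
    table = [[Fraction(0)] * n + [Fraction(1)]]
    for _ in range(K):
        prev = table[-1]
        table.append([(prev[i + 1] + prev[s[i]] + (L - 2) * prev[0]) / L
                      for i in range(n)] + [Fraction(1)])
    return table


def lemmas_hold(s, L, K):
    n = len(s)
    P = hitting_table(s, L, K)
    for k in range(K):
        for i in range(n + 1):
            if P[k + 1][i] < P[k][i]:              # Lemma 7: monotone in k
                return False
            if (P[k][i] > 0) != (k + i >= n):      # Lemma 8: positivity criterion
                return False
        for i in range(n):
            if k + i + 1 >= n:                     # Lemma 9: strict case
                if not P[k][i + 1] > P[k][i]:
                    return False
            elif not (P[k][i + 1] == P[k][i] == 0):  # Lemma 9: both sides vanish
                return False
    return True


n = 4
words = list(product(*(range(i + 1) for i in range(n))))  # all s with 0 <= s_i <= i
print(len(words), all(lemmas_hold(s, L, 12) for s in words for L in (2, 3)))
```

There are $1 \cdot 2 \cdot 3 \cdot 4 = 24$ admissible words $s$ for $n = 4$, so the check is
exhaustive for this length.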

Theorem 10. (comparison of Markov chains) Suppose we are given two $n$-ary words
$s = (s_0, \ldots, s_{n-1})$ and $s' = (s'_0, \ldots, s'_{n-1})$ which fulfill (6). Assume further that $s > s'$, i.e.
component-wise $s \ge s'$ but $s \ne s'$. Then for the associated Markov chains $X(s)$ and $X' := X(s')$
we have $P_k > P'_k$ for all $k \ge k_0 := n + 1 + \min\{\, i - s_i \mid 0 \le i \le n-1 \text{ with } s_i > s'_i \,\}$ and
$P_k = P'_k$ for all $k < k_0$.

Proof. Fix $i^*$ with $i^* - s_{i^*} = \min\{\, i - s_i \mid 0 \le i \le n-1 \text{ with } s_i > s'_i \,\}$.
More generally than the theorem, we show that $P_k(i) \ge P'_k(i)$ and that
$P_k(i) > P'_k(i)$ if $k + i \ge n + 1 + i^* - s_{i^*}$ and $i \le i^*$ and that
$P_k(i) = P'_k(i)$ if $k + i < n + 1 + i^* - s_{i^*}$. (For the remaining cases we make no statement
about whether the inequality is strict.)

We proceed by induction on $k$. The case $k = 0$ is trivial. For the induction step, we have

\[ P_k(i) = \frac{1}{L} P_{k-1}(i+1) + \frac{1}{L} P_{k-1}(s_i) + \frac{L-2}{L} P_{k-1}(0) \]

and

\[ P'_k(i) = \frac{1}{L} P'_{k-1}(i+1) + \frac{1}{L} P'_{k-1}(s'_i) + \frac{L-2}{L} P'_{k-1}(0). \]

Now we compare the summands on the right-hand side: $P_{k-1}(i+1) \ge P'_{k-1}(i+1)$ and
$P_{k-1}(0) \ge P'_{k-1}(0)$ by the induction hypothesis, and $P_{k-1}(s_i) \ge P'_{k-1}(s'_i)$ by the induction
hypothesis, $s_i \ge s'_i$ and Lemma 9.

For the strictness statement if $k + i \ge n + 1 + i^* - s_{i^*}$ and $i \le i^*$, note that
$P_{k-1}(i+1) > P'_{k-1}(i+1)$ if $i < i^*$ by induction hypothesis, while if $i = i^*$ then $s_i > s'_i$ and so
$P_{k-1}(s_i) > P_{k-1}(s'_i) \ge P'_{k-1}(s'_i)$ by Lemma 9 and the induction hypothesis, as
$k - 1 + s_i \ge n + i^* - i = n$.

To show $P_k(i) = P'_k(i)$ if $k + i < n + 1 + i^* - s_{i^*}$, again we proceed by induction. For the
induction step, note that the summands on the right-hand side of the recursion mutually
agree. If $s_i = s'_i$ this is by induction hypothesis, otherwise one notes additionally that
$k + i < n + 1 + i^* - s_{i^*} \le n + 1 + i - s_i$, so $k - 1 + s_i < n$, so $P_{k-1}(s_i) = 0$ by Lemma 8,
and likewise $P'_{k-1}(s'_i) = 0$ because $s'_i < s_i$. □
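
As a complement to the proof, the comparison statement can be tested exhaustively for small
$n$. The sketch below enumerates all pairs of words $s \ne s'$ of length $n = 4$ with
$0 \le s'_i \le s_i \le i$, computes both sequences of probabilities and checks equality
below $k_0 = n + 1 + \min\{ i - s_i \mid s_i > s'_i \}$ and strict inequality from $k_0$ on.
Names, the choice $n = 4$ and the ranges of $L$ and $k$ are assumptions made for this
illustration only.

```python
from fractions import Fraction
from itertools import product


def P_chain(s, L, K):
    """[P_0, ..., P_K] with P_k = Pr(X_k = n) for the chain X(s) started in state 0."""
    n = len(s)
    row = [Fraction(0)] * n + [Fraction(1)]
    result = [row[0]]
    for _ in range(K):
        row = [(row[i + 1] + row[s[i]] + (L - 2) * row[0]) / L
               for i in range(n)] + [Fraction(1)]
        result.append(row[0])
    return result


def comparison_holds(s, t, L, K):
    """Check P_k = P'_k for k < k_0 and P_k > P'_k for k >= k_0, where t plays s'."""
    n = len(s)
    k0 = n + 1 + min(i - s[i] for i in range(n) if s[i] > t[i])
    P, Q = P_chain(s, L, K), P_chain(t, L, K)
    return all((P[k] > Q[k]) if k >= k0 else (P[k] == Q[k]) for k in range(K + 1))


n = 4
words = list(product(*(range(i + 1) for i in range(n))))
pairs = [(s, t) for s in words for t in words
         if s != t and all(a >= b for a, b in zip(s, t))]
print(all(comparison_holds(s, t, L, 14) for s, t in pairs for L in (2, 3)))
```

The case $L = 2$ is included on purpose: there the weight $\frac{L-2}{L}$ of the jump to $0$
vanishes, which is where strictness would be most likely to fail.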

Now we are in a position to prove Theorem 3, concluding our work.

Proof of Theorem 3. An argument for part a) of the theorem was already mentioned at the
beginning of Section 2. Now for part b), we have the words $b$ and $b'$, their corresponding
bifix patterns $h$ and $h'$, and define $s := \tilde{h}$ and $s' := \tilde{h'}$. Then $s$ and $s'$ fulfill the requirement
of Theorem 10. The two theorems treat probabilities $P_k$ that are equal by Theorem 6 and
numbers $k_0$ which are equal because of

\[ \min\{\, n - i \mid 1 \le i \le n-1,\ h_i = 0 \text{ and } h'_i = 1 \,\} = 1 + \min\{\, i - s_i \mid 0 \le i \le n-1 \text{ with } s_i > s'_i \,\}. \]

Hence, we can use the conclusion of Theorem 10 that $P_k > P'_k$ for all $k \ge k_0$ and $P_k = P'_k$
for all $k < k_0$, which carries over to Theorem 3, and we are done.

□ 
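
Finally, the main theorem itself can be observed on small instances by brute force, without
any recursion or Markov chain. The sketch below enumerates all binary words of length 4,
computes their bifix indicators and the numbers of words of each length $k \le 10$
containing them, and checks that equal indicators give equal counts while a component-wise
larger indicator never gives a larger count and eventually gives a smaller one. It also
searches for the first length at which the words $(1,0,0)$ and $(1,0,1)$ differ. Names and
parameter ranges are our own, and the check is an illustration, not a proof.

```python
from itertools import product


def bifix_indicator(b):
    n = len(b)
    return tuple(int(b[:i] == b[n - i:]) for i in range(1, n))


def hit_counts(b, L, K):
    """counts[k] = number of words of length k over {0, ..., L-1} containing b."""
    n = len(b)
    return [sum(any(w[j:j + n] == b for j in range(k - n + 1))
                for w in product(range(L), repeat=k)) for k in range(K + 1)]


def monotone_in_indicator(L, n, K):
    data = [(bifix_indicator(b), hit_counts(b, L, K))
            for b in product(range(L), repeat=n)]
    for h, c in data:
        for h2, c2 in data:
            if h == h2 and c != c2:
                return False                      # part a) would be violated
            if h != h2 and all(x <= y for x, y in zip(h, h2)):
                if any(x < y for x, y in zip(c, c2)) or c[K] <= c2[K]:
                    return False                  # part b) would be violated
    return True


print(monotone_in_indicator(2, 4, 10))

# First length at which the counts for (1, 0, 0) and (1, 0, 1) differ.
c, c2 = hit_counts((1, 0, 0), 2, 6), hit_counts((1, 0, 1), 2, 6)
first_difference = next(k for k in range(7) if c[k] != c2[k])
print(first_difference, c[first_difference], c2[first_difference])  # 5 12 11
```

Counts can be compared directly because, for a fixed length $k$, all probabilities share
the denominator $L^k$.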

4. Summary of notation 

The following table is meant only as a guide for the reader. Our text does not follow it
completely, especially for indices like $n$, $k$ and so on.

    Symbol               Meaning
    $L$                  size of the alphabet
    $b$                  word that is searched for
    $n$                  length of $b$
    $h$                  bifix indicator
    $d$                  (finite) random word or (infinite) random sequence
    $k$                  length of a random word $d$, or index until which an infinite
                         random sequence $d$ is observed
    $P_k$                probability of $b$ being a subword in a random word of length $k$
    $p_k$                probability of $b$ being a subword in a random word of length $k$
                         only at the end
    $s$                  a word from $\{0, \ldots, n-1\}^{\{0, \ldots, n-1\}}$, will be derived from $h$
    $X(s)$               a Markov chain derived from $s$ (and thus from $h$)
    $p_{ij}$             transition probability for a Markov chain from $i$ to $j$
    $P_k(i)$             probability to get from $i$ to $n$ in at most $k$ steps
    $h'$, $P'_k$, $X'_k$   are derived from $b'$ the same way as $h$, $P_k$, $X_k$ from $b$